Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

In this case, because the p value is greater than 0.05, equal variances can be

assumed, and these data would qualify for the classic Student t test. As described

earlier, R gets around this by using Welch’s t test by default, which accommodates

both unequal and equal variances.
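
To make the difference between the two tests concrete, here is a minimal sketch in R. It assumes two made-up numeric vectors, group_m and group_oth, filled with simulated values rather than the NHANES data, and it relies only on the base R t.test() function and its var.equal argument.

```r
# Simulated stand-in data (hypothetical values, not the NHANES measurements)
set.seed(1)
group_m   <- rnorm(30, mean = 100, sd = 15)
group_oth <- rnorm(30, mean = 105, sd = 15)

# Default call: var.equal = FALSE, so R runs Welch's t test
welch <- t.test(group_m, group_oth)

# Setting var.equal = TRUE requests the classic pooled-variance Student t test
student <- t.test(group_m, group_oth, var.equal = TRUE)

welch$method        # names the test that was actually run
student$method
student$parameter   # Student degrees of freedom: 30 + 30 - 2 = 58
welch$parameter     # Welch degrees of freedom: never larger than 58
```

When the two sample variances are similar, the two results are usually close, which is why defaulting to the Welch version costs little.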

Assessing the ANOVA

In this section, we present the basic concepts underlying the analysis of variance

(ANOVA), which compares the means of three or more groups. We also describe

some of the more popular post-hoc tests used to follow a statistically significant

ANOVA. Finally, we show you how to run commands to execute an ANOVA and

post-hoc tests in R, and interpret the output.

Grasping how the ANOVA works

As described earlier in “Surveying Student t tests,” it is only possible to run a t

test on two groups. This is why we demonstrated the t test comparing married

NHANES participants (M) to all other marital statuses (OTH). We were testing the

null hypothesis M – OTH = 0 because we were only allowed to compare two groups!

So when comparing three groups, such as married (M), never married (NM), and

all others (OTH), it’s natural to think of pairing up the groups and running three t

tests (meaning testing M – NM, then testing M – OTH, then testing NM – OTH). But

running an exhaustive set of two-group t tests increases the likelihood of a Type I

error, which is a statistically significant result that arises purely by chance when no

true difference exists (for a review, read Chapter 3). And this is just with three groups!
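
If you want to see this inflation for yourself, a small simulation can help. The sketch below is only an illustration under assumptions we made up: three groups of 30 values each, all drawn from the same normal population (mean 100, standard deviation 15), so the null hypothesis is true by construction. It runs all three pairwise t tests many times and records how often at least one comes out significant.

```r
set.seed(42)          # arbitrary seed, used only so the run is reproducible
n_sims      <- 2000   # number of simulated studies (an assumption)
n_per_group <- 30     # participants per group (an assumption)
alpha       <- 0.05

any_false_positive <- replicate(n_sims, {
  # All three groups come from the same population, so any
  # "significant" difference is a Type I error
  m   <- rnorm(n_per_group, mean = 100, sd = 15)
  nm  <- rnorm(n_per_group, mean = 100, sd = 15)
  oth <- rnorm(n_per_group, mean = 100, sd = 15)

  p_values <- c(t.test(m, nm)$p.value,
                t.test(m, oth)$p.value,
                t.test(nm, oth)$p.value)

  any(p_values < alpha)
})

# Proportion of simulated studies with at least one false positive
familywise_rate <- mean(any_false_positive)
familywise_rate
```

Each individual test still has a 5 percent false-positive rate, but the chance that at least one of the three trips the alarm should come out well above 0.05.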

The general rule is that N groups can be paired up in N(N – 1)/2 different ways,

so in a study with six groups, you’d have (6 × 5)/2, or 15 two-group comparisons,

which is way too many.
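
You can check this count in R with the base functions choose() and combn(). The short sketch below is just an illustration. The variable names are ours, and the group labels M, NM, and OTH are the marital status groups from the example.

```r
# Number of two-group comparisons for studies with 3 to 6 groups
n_groups <- 3:6
n_pairs  <- choose(n_groups, 2)   # same as n_groups * (n_groups - 1) / 2
data.frame(n_groups, n_pairs)     # six groups give 15 comparisons

# List the actual pairings for the three marital status groups
group_pairs <- combn(c("M", "NM", "OTH"), 2)
group_pairs                       # one column per pairing
```

Each column of group_pairs is one comparison you would have to run as a separate t test.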

The term one-way ANOVA refers to an ANOVA with only one grouping variable in

it. The grouping variable usually has three or more levels because if it has only

two, most analysts just do a t test. In an ANOVA, you are testing whether the means

of the various levels are more spread out from each other than chance alone would

explain. It is not unusual for students

to be asked to calculate an ANOVA manually in a statistics class, but we skip that

here and just describe the result. One result derived from an ANOVA calculation is

expressed in a test statistic called the F ratio (designated simply as F). The F is the

ratio of how much variability there is between the groups relative to how much

variability there is within the groups. If the null hypothesis is true, and no true

difference exists between the groups (meaning the average fasting glucose in

M = NM = OTH), then the F ratio should be close to 1. Also, F’s sampling fluctua-

tions should follow the Fisher F distribution (see Chapter 24), which is actually a

family of distribution functions characterized by the following two numbers seen

in the ANOVA calculation: